Using Document Structure on Retrieving Webpages at the Web-CLEF 2006
Abstract
We report on our participation in the mixed monolingual web task of the 2006 Cross-Language Evaluation Forum (CLEF). We compared the results of web page retrieval based on the page content, page title, and anchor text. The retrieval effectiveness for the combination of page content, page title, and anchor texts was better than that of the combination of page title and page title only. Applying pseudo-relevance feedback improved the retrieval performance of the queries.
Similar resources
University of Glasgow at CLEF 2013: Experiments in eHealth Task 3 with Terrier
In our participation in the CLEF 2013 eHealth task 3, we investigate (1) the effectiveness of our Divergence from Randomness (DFR) framework on retrieving medical webpages, (2) the adoption of classical pseudo-relevance feedback for improving the representation of the queries, and (3) the exploitation of a collection enrichment technique for alleviating the mismatches between the terms in docum...
Knowledge extraction from webpages
This article presents a system to extract knowledge from webpages by producing semantic annotations. Taking into account semantic information from the domain to annotate an element in a webpage implies solving two problems: (1) identifying the syntactic structure of this element in the webpage and (2) identifying the most specific concept (in terms of subsumption) of the ontology that will be ...
Query-Structure Based Web Page Indexing
Indexing is a crucial technique for dealing with the massive amount of data present on the web. In our third participation in the web track at TREC 2012, we explore the idea of building an efficient query-based indexing system over a Web page collection. Our prototype explores the trends in user queries and consequently indexes texts using particular attributes available in the documents. This pa...
LIMSI @ CLEF eHealth 2015 - task 2
This paper presents LIMSI’s participation in the User-Centered Health Information Retrieval task (task 2) at the CLEF eHealth 2015 workshop [5]. In our contribution we explored two different strategies for query expansion: one based on entity recognition using MetaMap [1] and the UMLS [3], and a second strategy based on disease hypothesis generation using self-constructed external resources su...
Collecting and Organizing Web Content
To collect and organize Web content today, a user must make bookmarks, print whole webpages, or copy and paste pieces of webpages into a document. We present a framework for assisting the user in managing personal collections of Web content. The user interactively selects the webpage elements of interest, and the system builds an extraction pattern for those elements that is used to automaticall...
Publication date: 2006